Island Grammar-Based Parsing Using GLL and Tom
Internal identifier: 001561 (Main/Exploration); previous: 001560; next: 001562
Authors: Ali Afroozeh [Netherlands]; Jean-Christophe Bach [France]; Mark Van Den Brand [Netherlands]; Adrian Johnstone [United Kingdom]; Maarten Manders [Netherlands]; Pierre-Etienne Moreau [France]; Elizabeth Scott [United Kingdom]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ]
Abstract
Extending a language by embedding within it another language presents significant parsing challenges, especially if the embedding is recursive. The composite grammar is likely to be nondeterministic as a result of tokens that are valid in both the host and the embedded language. In this paper we examine the challenges of embedding the Tom language into a variety of general-purpose high-level languages. Tom provides syntax and semantics for advanced pattern matching and tree rewriting facilities. Embedded Tom constructs are translated into the host language by a preprocessor, the output of which is a composite program written purely in the host language. Tom implementations exist for Java, C, C#, Python and Caml. The current parser is complex and difficult to maintain. In this paper, we describe how Tom can be parsed using island grammars implemented with the Generalised LL (GLL) parsing algorithm. The grammar is, as might be expected, ambiguous. Extracting the correct derivation relies on our disambiguation strategy, which is based on pattern matching within the parse forest. We describe different classes of ambiguity and propose patterns for resolving them.
Url: https://api.istex.fr/ark:/67375/HCB-XB1KR35C-L/fulltext.pdf
DOI: 10.1007/978-3-642-36089-3_13
Affiliations:
- France, Netherlands, United Kingdom
- England, Grand Est, Greater London, Lorraine (region)
- London, Vandœuvre-lès-Nancy, Villers-lès-Nancy
- University of London, Université de Lorraine
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 003564
- to stream Istex, to step Curation: 003522
- to stream Istex, to step Checkpoint: 000187
- to stream Main, to step Merge: 001573
- to stream Main, to step Curation: 001561
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Island Grammar-Based Parsing Using GLL and Tom</title>
<author><name sortKey="Afroozeh, Ali" sort="Afroozeh, Ali" uniqKey="Afroozeh A" first="Ali" last="Afroozeh">Ali Afroozeh</name>
</author>
<author><name sortKey="Bach, Jean Christophe" sort="Bach, Jean Christophe" uniqKey="Bach J" first="Jean-Christophe" last="Bach">Jean-Christophe Bach</name>
</author>
<author><name sortKey="Van Den Brand, Mark" sort="Van Den Brand, Mark" uniqKey="Van Den Brand M" first="Mark" last="Van Den Brand">Mark Van Den Brand</name>
</author>
<author><name sortKey="Johnstone, Adrian" sort="Johnstone, Adrian" uniqKey="Johnstone A" first="Adrian" last="Johnstone">Adrian Johnstone</name>
</author>
<author><name sortKey="Manders, Maarten" sort="Manders, Maarten" uniqKey="Manders M" first="Maarten" last="Manders">Maarten Manders</name>
</author>
<author><name sortKey="Moreau, Pierre Etienne" sort="Moreau, Pierre Etienne" uniqKey="Moreau P" first="Pierre-Etienne" last="Moreau">Pierre-Etienne Moreau</name>
</author>
<author><name sortKey="Scott, Elizabeth" sort="Scott, Elizabeth" uniqKey="Scott E" first="Elizabeth" last="Scott">Elizabeth Scott</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E0F72B61DB77182F93E495438C84CA6C8C51FC1C</idno>
<date when="2013" year="2013">2013</date>
<idno type="doi">10.1007/978-3-642-36089-3_13</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-XB1KR35C-L/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">003564</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">003564</idno>
<idno type="wicri:Area/Istex/Curation">003522</idno>
<idno type="wicri:Area/Istex/Checkpoint">000187</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000187</idno>
<idno type="wicri:doubleKey">0302-9743:2013:Afroozeh A:island:grammar:based</idno>
<idno type="wicri:Area/Main/Merge">001573</idno>
<idno type="wicri:Area/Main/Curation">001561</idno>
<idno type="wicri:Area/Main/Exploration">001561</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Island Grammar-Based Parsing Using GLL and Tom</title>
<author><name sortKey="Afroozeh, Ali" sort="Afroozeh, Ali" uniqKey="Afroozeh A" first="Ali" last="Afroozeh">Ali Afroozeh</name>
<affiliation wicri:level="1"><country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Eindhoven University of Technology, NL-5612 AZ, Eindhoven</wicri:regionArea>
<wicri:noRegion>Eindhoven</wicri:noRegion>
</affiliation>
<affiliation></affiliation>
</author>
<author><name sortKey="Bach, Jean Christophe" sort="Bach, Jean Christophe" uniqKey="Bach J" first="Jean-Christophe" last="Bach">Jean-Christophe Bach</name>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>Inria, F-54600, Villers-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>LORIA, UMR 7503, Université de Lorraine, F-54500, Vandœuvre-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>LORIA, UMR 7503, CNRS, F-54500, Vandœuvre-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Van Den Brand, Mark" sort="Van Den Brand, Mark" uniqKey="Van Den Brand M" first="Mark" last="Van Den Brand">Mark Van Den Brand</name>
<affiliation wicri:level="1"><country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Eindhoven University of Technology, NL-5612 AZ, Eindhoven</wicri:regionArea>
<wicri:noRegion>Eindhoven</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Pays-Bas</country>
</affiliation>
</author>
<author><name sortKey="Johnstone, Adrian" sort="Johnstone, Adrian" uniqKey="Johnstone A" first="Adrian" last="Johnstone">Adrian Johnstone</name>
<affiliation wicri:level="4"><country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Royal Holloway, University of London, TW20 0EX, Surrey, Egham</wicri:regionArea>
<orgName type="university">Université de Londres</orgName>
<placeName><settlement type="city">Londres</settlement>
<region type="country">Angleterre</region>
<region type="région" nuts="1">Grand Londres</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Royaume-Uni</country>
</affiliation>
</author>
<author><name sortKey="Manders, Maarten" sort="Manders, Maarten" uniqKey="Manders M" first="Maarten" last="Manders">Maarten Manders</name>
<affiliation wicri:level="1"><country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Eindhoven University of Technology, NL-5612 AZ, Eindhoven</wicri:regionArea>
<wicri:noRegion>Eindhoven</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Pays-Bas</country>
</affiliation>
</author>
<author><name sortKey="Moreau, Pierre Etienne" sort="Moreau, Pierre Etienne" uniqKey="Moreau P" first="Pierre-Etienne" last="Moreau">Pierre-Etienne Moreau</name>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>Inria, F-54600, Villers-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>LORIA, UMR 7503, Université de Lorraine, F-54500, Vandœuvre-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>LORIA, UMR 7503, CNRS, F-54500, Vandœuvre-lès-Nancy</wicri:regionArea>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Scott, Elizabeth" sort="Scott, Elizabeth" uniqKey="Scott E" first="Elizabeth" last="Scott">Elizabeth Scott</name>
<affiliation wicri:level="4"><country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Royal Holloway, University of London, TW20 0EX, Surrey, Egham</wicri:regionArea>
<orgName type="university">Université de Londres</orgName>
<placeName><settlement type="city">Londres</settlement>
<region type="country">Angleterre</region>
<region type="région" nuts="1">Grand Londres</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Royaume-Uni</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Extending a language by embedding within it another language presents significant parsing challenges, especially if the embedding is recursive. The composite grammar is likely to be nondeterministic as a result of tokens that are valid in both the host and the embedded language. In this paper we examine the challenges of embedding the Tom language into a variety of general-purpose high level languages. Tom provides syntax and semantics for advanced pattern matching and tree rewriting facilities. Embedded Tom constructs are translated into the host language by a preprocessor, the output of which is a composite program written purely in the host language. Tom implementations exist for Java, C, C#, Python and Caml. The current parser is complex and difficult to maintain. In this paper, we describe how Tom can be parsed using island grammars implemented with the Generalised LL (GLL) parsing algorithm. The grammar is, as might be expected, ambiguous. Extracting the correct derivation relies on our disambiguation strategy which is based on pattern matching within the parse forest. We describe different classes of ambiguity and propose patterns for resolving them.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
<li>Pays-Bas</li>
<li>Royaume-Uni</li>
</country>
<region><li>Angleterre</li>
<li>Grand Est</li>
<li>Grand Londres</li>
<li>Lorraine (région)</li>
</region>
<settlement><li>Londres</li>
<li>Vandœuvre-lès-Nancy</li>
<li>Villers-lès-Nancy</li>
</settlement>
<orgName><li>Université de Londres</li>
<li>Université de Lorraine</li>
</orgName>
</list>
<tree><country name="Pays-Bas"><noRegion><name sortKey="Afroozeh, Ali" sort="Afroozeh, Ali" uniqKey="Afroozeh A" first="Ali" last="Afroozeh">Ali Afroozeh</name>
</noRegion>
<name sortKey="Manders, Maarten" sort="Manders, Maarten" uniqKey="Manders M" first="Maarten" last="Manders">Maarten Manders</name>
<name sortKey="Manders, Maarten" sort="Manders, Maarten" uniqKey="Manders M" first="Maarten" last="Manders">Maarten Manders</name>
<name sortKey="Van Den Brand, Mark" sort="Van Den Brand, Mark" uniqKey="Van Den Brand M" first="Mark" last="Van Den Brand">Mark Van Den Brand</name>
<name sortKey="Van Den Brand, Mark" sort="Van Den Brand, Mark" uniqKey="Van Den Brand M" first="Mark" last="Van Den Brand">Mark Van Den Brand</name>
</country>
<country name="France"><region name="Grand Est"><name sortKey="Bach, Jean Christophe" sort="Bach, Jean Christophe" uniqKey="Bach J" first="Jean-Christophe" last="Bach">Jean-Christophe Bach</name>
</region>
<name sortKey="Bach, Jean Christophe" sort="Bach, Jean Christophe" uniqKey="Bach J" first="Jean-Christophe" last="Bach">Jean-Christophe Bach</name>
<name sortKey="Bach, Jean Christophe" sort="Bach, Jean Christophe" uniqKey="Bach J" first="Jean-Christophe" last="Bach">Jean-Christophe Bach</name>
<name sortKey="Bach, Jean Christophe" sort="Bach, Jean Christophe" uniqKey="Bach J" first="Jean-Christophe" last="Bach">Jean-Christophe Bach</name>
<name sortKey="Moreau, Pierre Etienne" sort="Moreau, Pierre Etienne" uniqKey="Moreau P" first="Pierre-Etienne" last="Moreau">Pierre-Etienne Moreau</name>
<name sortKey="Moreau, Pierre Etienne" sort="Moreau, Pierre Etienne" uniqKey="Moreau P" first="Pierre-Etienne" last="Moreau">Pierre-Etienne Moreau</name>
<name sortKey="Moreau, Pierre Etienne" sort="Moreau, Pierre Etienne" uniqKey="Moreau P" first="Pierre-Etienne" last="Moreau">Pierre-Etienne Moreau</name>
<name sortKey="Moreau, Pierre Etienne" sort="Moreau, Pierre Etienne" uniqKey="Moreau P" first="Pierre-Etienne" last="Moreau">Pierre-Etienne Moreau</name>
</country>
<country name="Royaume-Uni"><region name="Angleterre"><name sortKey="Johnstone, Adrian" sort="Johnstone, Adrian" uniqKey="Johnstone A" first="Adrian" last="Johnstone">Adrian Johnstone</name>
</region>
<name sortKey="Johnstone, Adrian" sort="Johnstone, Adrian" uniqKey="Johnstone A" first="Adrian" last="Johnstone">Adrian Johnstone</name>
<name sortKey="Scott, Elizabeth" sort="Scott, Elizabeth" uniqKey="Scott E" first="Elizabeth" last="Scott">Elizabeth Scott</name>
<name sortKey="Scott, Elizabeth" sort="Scott, Elizabeth" uniqKey="Scott E" first="Elizabeth" last="Scott">Elizabeth Scott</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001561 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001561 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:E0F72B61DB77182F93E495438C84CA6C8C51FC1C |texte= Island Grammar-Based Parsing Using GLL and Tom }}
This area was generated with Dilib version V0.6.33.